33 research outputs found
The impact of environmental stochasticity on value-based multiobjective reinforcement learning
A common approach to address multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multi-objective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications, maximising SER subject to satisfying constraints on the variation in return, and show that this may require different solutions than ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multi-objective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature
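The value-based approach described in this abstract, a vector of Q-values per state-action pair combined with scalarisation-based action selection, can be sketched as follows. This is an illustrative tabular learner under assumed names and a linear utility function, not the paper's implementation:

```python
import numpy as np

class MOQLearner:
    """Minimal tabular multi-objective Q-learning sketch: Q-values are
    vectors (one entry per objective), and actions are chosen by
    linearly scalarising each action's Q-vector with a weight vector."""

    def __init__(self, n_states, n_actions, n_objectives, weights,
                 alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.Q = np.zeros((n_states, n_actions, n_objectives))
        self.w = np.asarray(weights, dtype=float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def select_action(self, s):
        # epsilon-greedy over the scalarised Q-vectors
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s] @ self.w))

    def update(self, s, a, reward_vec, s_next, done):
        # greedy bootstrap action is also chosen via scalarisation
        greedy_next = int(np.argmax(self.Q[s_next] @ self.w))
        target = np.asarray(reward_vec, dtype=float)
        if not done:
            target = target + self.gamma * self.Q[s_next, greedy_next]
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])
```

Note that this greedy scalarised update is precisely the kind of approach whose behaviour the abstract shows can diverge from SER-optimality under stochastic rewards or transitions.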
Softmax exploration strategies for multiobjective reinforcement learning
Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular, this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.
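One natural way to extend the softmax operator to vector-valued Q-estimates, sketched below under assumed names, is to scalarise each action's Q-vector first and then apply Boltzmann selection to the resulting scalars. This is an illustrative construction, not necessarily either of the paper's two proposed variants:

```python
import numpy as np

def softmax_action(q_vectors, weights, temperature=1.0, rng=None):
    """Boltzmann (softmax) exploration over vector-valued Q-estimates:
    scalarise each action's Q-vector with a weight vector, then sample
    an action with probability proportional to exp(q / temperature)."""
    rng = rng or np.random.default_rng()
    scalarised = np.asarray(q_vectors, dtype=float) @ np.asarray(weights, dtype=float)
    z = scalarised / temperature
    z -= z.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    action = int(rng.choice(len(probs), p=probs))
    return action, probs
```

Lowering the temperature concentrates probability on the highest scalarised value, recovering greedy selection in the limit; raising it approaches uniform exploration.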
Towards machine learning approach for digital-health intervention program
Digital-health interventions (DHIs) are used by health care providers to promote engagement within the community. Effective assignment of participants into DHI programs helps increase the benefits gained from the most suitable intervention. A major challenge with the roll-out and implementation of DHIs is in assigning participants into different interventions. The use of the biopsychosocial model [18] for this purpose is not widespread, due to limited personalized interventions formed on evidence-based data-driven models. Machine learning has changed the way data extraction and interpretation work by involving automatic sets of generic methods that have replaced traditional statistical techniques. In this paper, we investigate the relevance of machine learning for this purpose by studying different non-linear classifiers and comparing their prediction accuracy to evaluate their suitability. Further, as a novel contribution, real-life biopsychosocial features are used as input in this study. The results help in developing an appropriate predictive classification model to assign participants into the most suitable DHI. We analyze biopsychosocial data generated from a DHI program and study their feature characteristics using scatter plots. While scatter plots are unable to reveal the linear relationships in the data-set, the use of classifiers can successfully identify which features are suitable predictors of mental ill health.
Language representations for generalization in reinforcement learning
The choice of state and action representation in Reinforcement Learning (RL) has a significant effect on agent performance for the training task, but its relationship with generalization to new tasks is under-explored. One approach to improving generalization investigated here is the use of language as a representation. We compare vector-states and discrete actions to language representations. We find the agents using language representations generalize better and could solve tasks with more entities, new entities, and more complexity than seen in the training task. We attribute this to the compositionality of language.
An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments
One common approach to solve multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However, issues can arise with this approach in the context of stochastic environments, particularly when optimising for the Scalarised Expected Return (SER) criterion. This paper extends prior research, providing a detailed examination of the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal policy for an environment with stochastic state transitions. We empirically examine several variations of the core multi-objective Q-learning algorithm as well as reward engineering approaches, and demonstrate the limitations of these methods. In particular, we highlight the critical impact of noisy Q-value estimates on the stability and convergence of these algorithms. Comment: arXiv admin note: substantial text overlap with arXiv:2211.0866
Portal-based sound propagation for first-person computer games
First-person computer games are a popular modern video game genre. A new method is proposed, the Directional Propagation Cache, that takes advantage of the very common portal spatial subdivision method to accelerate environmental acoustics simulation for first-person games, by caching sound propagation information between portals.
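The caching idea in this abstract, memoising sound propagation results between pairs of portals so they need not be recomputed every frame, can be illustrated with a minimal sketch. The class and function names below are assumptions for illustration, not the paper's actual data structures:

```python
class DirectionalPropagationCache:
    """Illustrative sketch: memoise expensive portal-to-portal sound
    propagation computations, keyed by the (source, destination) portal
    pair, so repeated queries reuse the cached result."""

    def __init__(self, compute_propagation):
        # compute_propagation(src, dst) performs the expensive simulation step
        self._compute = compute_propagation
        self._cache = {}

    def propagation(self, src_portal, dst_portal):
        key = (src_portal, dst_portal)  # direction matters, so no sorting
        if key not in self._cache:
            self._cache[key] = self._compute(src_portal, dst_portal)
        return self._cache[key]

    def invalidate(self, portal):
        # drop cached entries touching a portal whose geometry changed
        self._cache = {k: v for k, v in self._cache.items()
                       if portal not in k}
```

In a real engine the cached value would be propagation data such as attenuation and delay per direction; here it is left opaque.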
Human Engagement Providing Evaluative and Informative Advice for Interactive Reinforcement Learning
Reinforcement learning is an approach used by intelligent agents to autonomously learn new skills. Although reinforcement learning has been demonstrated to be an effective learning approach in several different contexts, a common drawback is the time needed to satisfactorily learn a task, especially in large state-action spaces. To address this issue, interactive reinforcement learning proposes the use of externally-sourced information to speed up the learning process. To date, different information sources have been used to give advice to the learner agent, among them human-sourced advice. When interacting with a learner agent, humans may provide either evaluative or informative advice. From the agent's perspective these styles of interaction are commonly referred to as reward-shaping and policy-shaping respectively. Evaluation requires the human to provide feedback on the prior action performed, while with informative advice the human suggests the best action to select for a given situation. Prior research has focused on the effect of human-sourced advice on the interactive reinforcement learning process, specifically aiming to improve the learning speed of the agent while reducing the engagement with the human. This work presents an experimental setup for a human trial designed to compare the methods people use to deliver advice in terms of human engagement. The results show that users giving informative advice to the learner agents provide more accurate advice, are willing to assist the learner agent for a longer time, and provide more advice per episode. Additionally, self-evaluation from participants using the informative approach indicated that the agent's ability to follow the advice is higher, and therefore they feel their own advice to be of higher accuracy when compared to people providing evaluative advice. Comment: 33 pages, 15 figures
A NetHack Learning Environment Language Wrapper for Autonomous Agents
This paper describes a language wrapper for the NetHack Learning Environment (NLE) [1]. The wrapper replaces the non-language observations and actions with comparable language versions. The NLE offers a grand challenge for AI research, while MiniHack [2] extends this potential to more specific and configurable tasks. By providing a language interface, we can enable further research on language agents and directly connect language models to a versatile environment.
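The general shape of such a wrapper, replacing structured observations with text descriptions and accepting text commands in place of discrete action indices, can be sketched as follows. All names here are illustrative assumptions; this is not the actual NLE wrapper's API:

```python
class TextInterfaceWrapper:
    """Illustrative sketch of a language wrapper: it flattens a dict
    observation into a deterministic text description and maps textual
    commands back to the underlying discrete action indices."""

    def __init__(self, env, action_names):
        self.env = env
        # map each textual command to its discrete action index
        self._actions = {name: i for i, name in enumerate(action_names)}

    def describe(self, obs):
        # deterministic text rendering of a dict observation
        return "; ".join(f"{k} is {v}" for k, v in sorted(obs.items()))

    def reset(self):
        return self.describe(self.env.reset())

    def step(self, command):
        obs, reward, done, info = self.env.step(self._actions[command])
        return self.describe(obs), reward, done, info
```

A language model or text-based agent can then interact with the wrapped environment entirely through strings, which is the interface the paper argues enables further research on language agents.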
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature